1 Introduction

The problem of fitting an event distribution when the total expected number of events is
not fixed keeps appearing in experimental studies. Peelle’s Pertinent Puzzle (PPP) notes
that in a $\chi^2$ fit, if the overall normalization is one of the parameters to be fit, the
fitted curve may be seriously low with respect to the data points, sometimes below all of
them. This puzzle was the subject of a NIM article by G. D’Agostini (NIMA 346 (1994)
306). This problem and its solution are well known within the statistics community
but, apparently, not well known among some of the physics community. The purpose of
this note is didactic: to explain the cause of the problem and the easy and elegant solution.

The solution is to use maximum likelihood (ML) instead of $\chi^2$. The essential difference
between the two approaches is that ML uses the normalization of each term in the $\chi^2$,
assuming it is a normal distribution, $1/\sqrt{2\pi\sigma^2}$. In addition, the normalization is applied
to the theoretical expectation, not to the data. In the present note we illustrate what goes
wrong and how maximum likelihood fixes the problem in a very simple toy example, which
illustrates the problem clearly and is the appropriate physics model for event histograms.
We then note how a simple modification to the $\chi^2$ method gives a result identical to the
ML method. I will also discuss the models in G. D’Agostini’s article (p. 309) and add one
more.

2 Toy Model: $\chi^2$

Consider a simple data set with only two bins. Theory predicts that the expected number
of events, $N$, should be the same for each bin, and that the bins are uncorrelated. Let
$x_1$ and $x_2$ be the number of events experimentally found in the two bins. The variance
($\sigma^2$) is $N$ for each bin ($\sigma = \sqrt{N}$), so that

$$\chi^2 = \frac{(N - x_1)^2}{N} + \frac{(N - x_2)^2}{N}.$$

We want to find the minimum, $d\chi^2/dN = 0$. Call Term 1 the derivative with respect to
the numerators of the $\chi^2$:

$$\text{Term 1} = 2\left(\frac{N - x_1}{N} + \frac{N - x_2}{N}\right) = 2\left(1 - \frac{x_1}{N}\right) + 2\left(1 - \frac{x_2}{N}\right).$$

If we ignore the derivative of the denominator, then Term 1 = 0 is solved by
$N = (x_1 + x_2)/2$. Call this the naive solution.

Call Term 2 the derivative with respect to the denominator of the $\chi^2$:

$$\text{Term 2} = -\frac{(N - x_1)^2 + (N - x_2)^2}{N^2} = -\frac{\chi^2}{N}.$$

Term 2 is negative and $O(1/N)$. The only way that Term 1 + Term 2 = 0 is for Term 1 to
be positive. This means that the $\chi^2$ solution must have $N$ greater than the naive value.
Although Term 1 is $O(1)$, $x_1/N$ and $x_2/N$ are $O(1/N)$. $N$ is pulled up as the fit wants to
make the fractional errors larger. (Had the normalization been put into the data, not the
theoretical value, the fitted curve would have been low.)
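
As a quick numerical check of this argument, the two-bin $\chi^2$ can be minimized in closed form: Term 1 + Term 2 $= 2 - (x_1^2 + x_2^2)/N^2$, so the minimum sits at the root-mean-square of the two counts, which is never below their mean. The sketch below assumes illustrative counts $x_1 = 90$ and $x_2 = 110$ (not taken from the text), and the function names are hypothetical.

```python
import math

def term1(n, x1, x2):
    # Derivative of the chi^2 numerators with respect to N.
    return 2.0 * (1.0 - x1 / n) + 2.0 * (1.0 - x2 / n)

def term2(n, x1, x2):
    # Derivative of the chi^2 denominator (variance = N) with respect to N.
    return -((n - x1) ** 2 + (n - x2) ** 2) / n ** 2

def naive_solution(x1, x2):
    # Root of Term 1 = 0.
    return 0.5 * (x1 + x2)

def chi2_solution(x1, x2):
    # Root of Term 1 + Term 2 = 2 - (x1^2 + x2^2)/N^2 = 0.
    return math.sqrt(0.5 * (x1 ** 2 + x2 ** 2))

x1, x2 = 90.0, 110.0  # illustrative counts, an assumption of this sketch
n_naive = naive_solution(x1, x2)
n_chi2 = chi2_solution(x1, x2)
print("naive N:", n_naive, " chi^2 N:", n_chi2)
print("Term 1:", term1(n_chi2, x1, x2), " Term 2:", term2(n_chi2, x1, x2))
```

At the $\chi^2$ minimum Term 1 is positive and exactly balances the negative Term 2, so the fitted $N$ lies above the naive value.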

3 Toy Model: Maximum Likelihood

The likelihood ($\mathcal{L}$) is the probability density function for the two bins, assuming each bin
has a normal distribution. (This requires that $N$ is not too small.)

$$\mathcal{L} = \frac{1}{\sqrt{2\pi\sigma^2}}\,\frac{1}{\sqrt{2\pi\sigma^2}}\; e^{-(N - x_1)^2/(2\sigma^2)}\, e^{-(N - x_2)^2/(2\sigma^2)}.$$

For $\sigma^2 = N$, the log of the likelihood is:

$$\ln \mathcal{L} = -\ln(2\pi) - \ln N - \chi^2/2.$$

Let Term 3 be the derivative of the normalization:

$$\text{Term 3} = -\frac{1}{N}.$$

The derivative of $\ln \mathcal{L}$ is Term 3 $-$ (Term 1)/2 $-$ (Term 2)/2, and

$$\text{Term 3} - \frac{\text{Term 2}}{2} = -\frac{1}{N} + \frac{(N - x_1)^2 + (N - x_2)^2}{2N^2} = \frac{-2N + (N - x_1)^2 + (N - x_2)^2}{2N^2}.$$

Since the expectation values are $E(N - x_1)^2 = E(N - x_2)^2 = N$, the expectation value of
Term 3 $-$ (Term 2)/2 is zero. For fitted values a modification is needed. Assume that there
is only one overall normalization factor and assume now that there are $n_b$ bins. The
expectation value for a $\chi^2$ with $n_b$ bins and $n_f$ fitted parameters is $n_b - n_f$. This occurs
because, after fitting, the multidimensional normal distribution loses $n_f$ variables. This
means that, for $n_b = 2$, $n_f = 1$, the expected $\chi^2$ entering Term 2 is $2 \times 1/2 = 1$. The same loss
in dimensions requires Term 3, the normalization term of the multidimensional distribution,
to be multiplied by $(n_b - n_f)/n_b$ to match the change in $\chi^2$, since the fit has integrated
over those variables. The change in expectation value occurs automatically in the fit, but
the modification to Term 3 must be put in by hand.
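
To see what the factor $(n_b - n_f)/n_b$ does in the two-bin toy model, note that setting $s\,\cdot$Term 3 $-$ (Term 1)/2 $-$ (Term 2)/2 to zero gives the quadratic $N^2 + sN - (x_1^2 + x_2^2)/2 = 0$, where $s$ scales Term 3. The sketch below solves it for $s = 1$ (unmodified), $s = 1/2$ (the by-hand correction) and $s = 0$ (Term 3 dropped, which is the plain $\chi^2$). The counts 45 and 55 are an illustrative assumption, chosen so that the squared half-difference equals half the mean, a typical fluctuation under the model; the function name is hypothetical.

```python
import math

def ml_solution(x1, x2, term3_scale=1.0):
    # Positive root of N^2 + s*N - (x1^2 + x2^2)/2 = 0, where s scales Term 3.
    s = term3_scale
    sum_sq = x1 ** 2 + x2 ** 2
    return 0.5 * (-s + math.sqrt(s * s + 2.0 * sum_sq))

x1, x2 = 45.0, 55.0                 # illustrative counts, an assumption
n_full = ml_solution(x1, x2, 1.0)   # Term 3 unmodified
n_half = ml_solution(x1, x2, 0.5)   # Term 3 scaled by (n_b - n_f)/n_b = 1/2
n_chi2 = ml_solution(x1, x2, 0.0)   # Term 3 dropped: plain chi^2
print(n_full, n_half, n_chi2)
```

With these inputs the plain $\chi^2$ root lies above the mean of 50, the unmodified ML root lies slightly below it, and the halved Term 3 returns the mean.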

There is an easy general way to handle this problem. The problem arises because the
error matrix is a function of normalization. When the simple $\chi^2$ method is applied, the
derivative of the $\chi^2$ is in error because the change in the normalization of the probability
density function is not taken into account. Including this term in the ML approach eliminates
the problem. This leads to a simple approach using a modified $\chi^2$ analysis. Consider $n_b$ bins
and $g$ fitting parameters $p_j$. Let $n_i(p_1, p_2, \cdots, p_g)$ be the expected number of events in bin
$i$. The distribution of experimental events in each bin is taken as approximately normal.
The total number of events in the histogram is not fixed. Choose the set $n_i$ as the basis.
The error matrix is diagonal in this basis. Ignoring the $2\pi$ constants:

$$\ln \mathcal{L} = -\sum_i \left[\frac{1}{2}\ln n_i + \frac{(x_i - n_i)^2}{2 n_i}\right],$$

$$\frac{\partial \ln \mathcal{L}}{\partial n_i} = \frac{x_i - n_i}{n_i} + \frac{1}{2 n_i}\left[\frac{(x_i - n_i)^2}{n_i} - 1\right]. \tag{9}$$

The expectation value for the term in square brackets is zero. Recall that the expectation
refers to the average value over a number of repetitions of the experiment. It is $x_i$ that
changes with each experiment, not the theoretical expectation, $n_i$. The expectation value
of the term in square brackets will remain zero even if it is multiplied by a complicated
function of the $p_j$ fitting parameters. Ignoring this term leads to:

$$\frac{\partial \ln \mathcal{L}}{\partial p_j} = \sum_i \frac{x_i - n_i}{n_i}\,\frac{\partial n_i}{\partial p_j}. \tag{10}$$

By expressing the $n_i$ as the appropriate functions of the $p_j$, the error matrix can be written
in terms of the $p_j$. However, the derivative of the inverse error matrix does not appear in the
transform of Equation 10. This result means that one can use a modified $\chi^2$ approach. Use
the usual $\chi^2$, but, when derivatives are taken to find the $\chi^2$ minimum, omit the derivatives
of the inverse error matrix. The result is identical to the result from ML. The modified $\chi^2$
method should be generally used in place of the regular $\chi^2$ method.
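
One way to carry out the modified $\chi^2$ prescription in practice is to hold the variances fixed while solving for the parameters, then update the variances from the new theory values, and repeat until the values settle. The sketch below does this for a hypothetical straight-line model $n_i = a + b\,t_i$. The model, the bin positions `t`, the counts `x` and the function name are all illustrative assumptions, not part of the original note. At convergence the parameters satisfy Equation 10.

```python
def modified_chi2_line_fit(t, x, n_iter=100):
    """Fit n_i = a + b*t_i with the variances n_i held fixed within each step."""
    # First guess for the variances: the observed counts (floored at 1).
    w = [1.0 / max(xi, 1.0) for xi in x]
    a = b = 0.0
    for _ in range(n_iter):
        # Weighted least-squares normal equations with frozen weights w_i,
        # so no derivative of the inverse error matrix ever appears.
        s0 = sum(w)
        s1 = sum(wi * ti for wi, ti in zip(w, t))
        s2 = sum(wi * ti * ti for wi, ti in zip(w, t))
        r0 = sum(wi * xi for wi, xi in zip(w, x))
        r1 = sum(wi * ti * xi for wi, ti, xi in zip(w, t, x))
        det = s0 * s2 - s1 * s1
        a = (r0 * s2 - r1 * s1) / det
        b = (s0 * r1 - s1 * r0) / det
        # Update the variances from the theory, not from the data.
        w = [1.0 / (a + b * ti) for ti in t]
    return a, b

t = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [98.0, 112.0, 119.0, 131.0, 140.0]  # made-up counts for illustration
a, b = modified_chi2_line_fit(t, x)
n = [a + b * ti for ti in t]
# Left-hand sides of Equation 10 for p_j = a and p_j = b; both should vanish.
score_a = sum((xi - ni) / ni for xi, ni in zip(x, n))
score_b = sum((xi - ni) / ni * ti for xi, ni, ti in zip(x, n, t))
print(a, b, score_a, score_b)
```

For a model that is linear in its parameters, as here, Equation 10 also implies that the fitted total $\sum_i n_i$ equals the observed total $\sum_i x_i$, which is a convenient check on the normalization.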
In practice, since the differences are not precisely the expectation values for a given
experiment, there is a small residual higher-order effect, which causes no bias on the
average.
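
A small Monte Carlo experiment can illustrate the statement about bias for the two-bin toy model. The sketch below draws each bin from a normal distribution with mean and variance $N$, then averages the plain $\chi^2$ estimate (the root-mean-square of the two counts) and the modified $\chi^2$ estimate (their arithmetic mean, from Term 1 = 0). The true value $N = 100$, the seed and the number of trials are arbitrary choices of this sketch.

```python
import math
import random

def simulate(n_true=100.0, trials=20000, seed=12345):
    rng = random.Random(seed)
    sigma = math.sqrt(n_true)
    sum_plain = 0.0
    sum_modified = 0.0
    for _ in range(trials):
        x1 = rng.gauss(n_true, sigma)
        x2 = rng.gauss(n_true, sigma)
        # Plain chi^2 minimum: root-mean-square of the counts.
        sum_plain += math.sqrt(0.5 * (x1 * x1 + x2 * x2))
        # Modified chi^2 (Term 1 = 0): arithmetic mean of the counts.
        sum_modified += 0.5 * (x1 + x2)
    return sum_plain / trials, sum_modified / trials

mean_plain, mean_modified = simulate()
print("plain chi^2:", mean_plain, " modified chi^2:", mean_modified)
```

The plain $\chi^2$ average comes out above the modified one in every run, because the root-mean-square of two numbers is never smaller than their mean.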


4 Review of G. D’Agostini’s models 


The problem he discusses is a bit different from that treated in the toy model. He imagines
that we have two measurements of the same physical quantity, but that there is a possible
scale error $f$ and a best value $k$ of the two measurements, $x_1$ and $x_2$, to be fit. The models
presented by D’Agostini can be written in the form:

$$\chi^2_n = \frac{(f x_1 - k)^2}{f^n \sigma_1^2} + \frac{(f x_2 - k)^2}{f^n \sigma_2^2} + \frac{(f - 1)^2}{\sigma_f^2} = \frac{(x_1 - k/f)^2}{f^{n-2} \sigma_1^2} + \frac{(x_2 - k/f)^2}{f^{n-2} \sigma_2^2} + \frac{(f - 1)^2}{\sigma_f^2}. \tag{11}$$

He treats the cases $n = 2$ (Model A) and $n = 0$ (Model B). We will also discuss the case
$n = -1$. D’Agostini finds that $n = 2$ does not exhibit PPP, but $n = 0$ does exhibit it.
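
The equality of the two expressions in Equation 11 follows from $(f x_i - k)^2 = f^2 (x_i - k/f)^2$, and it can be confirmed numerically. The sketch below evaluates both forms for $n = 2, 0, -1$; the input values are arbitrary and the function names are hypothetical.

```python
def chi2_form1(n, f, k, x1, x2, s1, s2, sf):
    # Scale factor applied to the data: (f*x - k)^2 / (f^n sigma^2).
    return ((f * x1 - k) ** 2 / (f ** n * s1 ** 2)
            + (f * x2 - k) ** 2 / (f ** n * s2 ** 2)
            + (f - 1.0) ** 2 / sf ** 2)

def chi2_form2(n, f, k, x1, x2, s1, s2, sf):
    # Scale factor moved to the theory: (x - k/f)^2 / (f^(n-2) sigma^2).
    return ((x1 - k / f) ** 2 / (f ** (n - 2) * s1 ** 2)
            + (x2 - k / f) ** 2 / (f ** (n - 2) * s2 ** 2)
            + (f - 1.0) ** 2 / sf ** 2)

# Arbitrary test point, an assumption of this sketch.
args = dict(f=0.9, k=7.5, x1=8.0, x2=8.5, s1=0.16, s2=0.17, sf=0.1)
for n in (2, 0, -1):
    print(n, chi2_form1(n, **args), chi2_form2(n, **args))
```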

There are two errors in the method of D’Agostini, which we have already mentioned in 
the previous section. 

• The use of the $\chi^2$ distribution incorrectly ignores the changes of normalization of the
multidimensional density distribution as the normalization parameter is changed.

• The normalization parameter $N$ should be included in the theoretically expected
value, not in the data value. The experimentally observed number of events is what
it is. D’Agostini’s $f$ is $1/N$. This has two effects. The first effect is that the
normalization dependence of the error matrix is changed. The second effect is that
the average of $N$ is not the same as the average of $1/N$.


First consider the ML solution. Using $N$ as normalization,

$$\chi^2_n = \frac{(x_1 - Nk)^2}{N^{2-n} \sigma_1^2} + \frac{(x_2 - Nk)^2}{N^{2-n} \sigma_2^2} + \frac{(N - 1)^2}{\sigma_N^2}.$$

It is assumed here that $\sigma_N$ is a fixed number, rather than having $\sigma_f$ fixed. Let

$$\chi^{2*} = \chi^2 - \frac{(N - 1)^2}{\sigma_N^2}.$$

The derivative of the numerator of $\chi^2$ with respect to $N$ is:

$$\frac{2k(Nk - x_1)}{N^{2-n} \sigma_1^2} + \frac{2k(Nk - x_2)}{N^{2-n} \sigma_2^2} + \frac{2(N - 1)}{\sigma_N^2}.$$

The derivative of the denominator is:

$$\frac{n - 2}{N}\,\chi^{2*}.$$

For ML the $N$-dependent part of the normalization term is $(1/\sqrt{N^{2-n}})^2$. The log of this
term is $-(2 - n)\ln N$ and the derivative of the log with respect to $N$ is $(n - 2)/N$. For ML
then:

$$\frac{\partial \ln \mathcal{L}}{\partial N} = \frac{n - 2}{N} - \frac{1}{2}\left[\frac{2k(Nk - x_1)}{N^{2-n} \sigma_1^2} + \frac{2k(Nk - x_2)}{N^{2-n} \sigma_2^2} + \frac{2(N - 1)}{\sigma_N^2} + \frac{n - 2}{N}\,\chi^{2*}\right]. \tag{16}$$

Here, the expectation value of the $\chi^{2*}$ term is 1 after fitting and the normalization term
is reduced to $(n - 2)/(2N)$ to account for the loss of a degree of freedom. For any $n$, the
ML normalization term cancels the expectation value of the denominator derivative.

Next look at this using D’Agostini’s calculation. For any $n$ value, the derivative with
respect to $k$ is:

$$\frac{\partial \chi^2_n}{\partial k} = \frac{2}{f^{n-1}}\left[\frac{k/f - x_1}{\sigma_1^2} + \frac{k/f - x_2}{\sigma_2^2}\right]. \tag{17}$$

Hence,

$$k = f\left(\frac{x_1}{\sigma_1^2} + \frac{x_2}{\sigma_2^2}\right) \Big/ \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right),$$

which is the expected result from combining two measurements of the same quantity, except
for the factor $f$. Define the result for $f = 1$ to be $\bar{x}$:

$$\bar{x} = \left(\frac{x_1}{\sigma_1^2} + \frac{x_2}{\sigma_2^2}\right) \Big/ \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right).$$

Note that for $\chi^2_n$ the derivative of the numerators of the first two terms together
has been determined to be zero from the derivative with respect to $k$.


4.1 n = 2, Model A 

Using the result from the derivative with respect to $k$, it is seen that for the derivative with
respect to $f$ (using the 2nd expression in Equation 11 with $f^{n-2} = 1$ in the denominator),
the derivatives of the first two terms add to zero from the result of the derivative with
respect to $k$ seen in Equation 17, and then $f$ is forced to be 1. D’Agostini finds that this
does not have a PPP problem, as expected, since the variance is independent of $f$.

4.2 n = 0, Model B

The derivative with respect to $k$ is:

$$\frac{\partial \chi^2_B}{\partial k} = 2\left[\frac{k - f x_1}{\sigma_1^2} + \frac{k - f x_2}{\sigma_2^2}\right].$$

Here, $f$ will not be one. Using the result from the partial derivative with respect to $k$,
$k = f\bar{x}$, $\chi^2_B$ can be written:

$$\chi^2_B = f^2\left[\frac{(x_1 - \bar{x})^2}{\sigma_1^2} + \frac{(x_2 - \bar{x})^2}{\sigma_2^2}\right] + \frac{(f - 1)^2}{\sigma_f^2},$$

$$\frac{\partial \chi^2_B}{\partial f} = 2f\left[\frac{(x_1 - \bar{x})^2}{\sigma_1^2} + \frac{(x_2 - \bar{x})^2}{\sigma_2^2}\right] + \frac{2(f - 1)}{\sigma_f^2} = 2f\left[\frac{1}{\sigma_f^2} + \frac{(x_1 - \bar{x})^2}{\sigma_1^2} + \frac{(x_2 - \bar{x})^2}{\sigma_2^2}\right] - \frac{2}{\sigma_f^2},$$

$$f = 1 \Big/ \left[1 + \sigma_f^2\left(\frac{(x_1 - \bar{x})^2}{\sigma_1^2} + \frac{(x_2 - \bar{x})^2}{\sigma_2^2}\right)\right].$$

The differences from $\bar{x}$ are:

$$x_1 - \bar{x} = x_1 - \left(\frac{x_1}{\sigma_1^2} + \frac{x_2}{\sigma_2^2}\right) \Big/ \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) = \frac{x_1 - x_2}{\sigma_2^2\,(1/\sigma_1^2 + 1/\sigma_2^2)}.$$

Similarly,

$$x_2 - \bar{x} = \frac{x_2 - x_1}{\sigma_1^2\,(1/\sigma_1^2 + 1/\sigma_2^2)}.$$

To find $f$, consider:

$$\frac{(x_1 - \bar{x})^2}{\sigma_1^2} + \frac{(x_2 - \bar{x})^2}{\sigma_2^2} = \frac{(x_1 - x_2)^2}{\sigma_1^2 \sigma_2^4\,(1/\sigma_1^2 + 1/\sigma_2^2)^2} + \frac{(x_1 - x_2)^2}{\sigma_1^4 \sigma_2^2\,(1/\sigma_1^2 + 1/\sigma_2^2)^2} = \frac{(x_1 - x_2)^2}{\sigma_1^2 + \sigma_2^2},$$

$$f = \frac{1}{1 + \sigma_f^2 (x_1 - x_2)^2/(\sigma_1^2 + \sigma_2^2)}.$$

$f$ is always less than one. This is the result obtained by D’Agostini.
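
For concreteness, the Model B formulas can be evaluated with numbers. The sketch below assumes two measurements, 8.0 and 8.5, each with a 2% error, and a 10% scale uncertainty; these inputs and the function names are illustrative assumptions. It computes $\bar{x}$, $f$ and $k = f\bar{x}$, and then checks that both partial derivatives of $\chi^2_B$ vanish at that point.

```python
def model_b_fit(x1, x2, s1, s2, sf):
    w1, w2 = 1.0 / s1 ** 2, 1.0 / s2 ** 2
    xbar = (w1 * x1 + w2 * x2) / (w1 + w2)  # weighted mean (the f = 1 result)
    f = 1.0 / (1.0 + sf ** 2 * (x1 - x2) ** 2 / (s1 ** 2 + s2 ** 2))
    return xbar, f, f * xbar

def grad_chi2_b(k, f, x1, x2, s1, s2, sf):
    # Partial derivatives of
    # (f*x1 - k)^2/s1^2 + (f*x2 - k)^2/s2^2 + (f - 1)^2/sf^2.
    d_k = -2.0 * ((f * x1 - k) / s1 ** 2 + (f * x2 - k) / s2 ** 2)
    d_f = (2.0 * x1 * (f * x1 - k) / s1 ** 2
           + 2.0 * x2 * (f * x2 - k) / s2 ** 2
           + 2.0 * (f - 1.0) / sf ** 2)
    return d_k, d_f

x1, x2 = 8.0, 8.5                           # illustrative measurements
s1, s2, sf = 0.02 * x1, 0.02 * x2, 0.10     # 2% errors, 10% scale uncertainty
xbar, f, k = model_b_fit(x1, x2, s1, s2, sf)
d_k, d_f = grad_chi2_b(k, f, x1, x2, s1, s2, sf)
print("xbar:", xbar, " f:", f, " k:", k)
print("gradient:", d_k, d_f)
```

With these inputs the fitted $k$ comes out near 7.87, below both measurements, which is the PPP behaviour discussed in this section.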
4.3 n = -1, the Toy Model


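Use the notation of D’Agostini. Again, the numerator derivatives of the first two terms of
$\partial \chi^2_{n=-1}/\partial f$ are zero, so that

$$\frac{\partial \chi^2_{n=-1}}{\partial f} = \frac{1}{f}\left[\frac{(f x_1 - k)^2}{f^{-1} \sigma_1^2} + \frac{(f x_2 - k)^2}{f^{-1} \sigma_2^2}\right] + \frac{2(f - 1)}{\sigma_f^2}. \tag{28}$$

The expectation value of the first two terms is $2/f$:

$$\frac{\partial \chi^2_{n=-1}}{\partial f} = \frac{2}{f} + \frac{2(f - 1)}{\sigma_f^2}. \tag{29}$$

This will be far from $f = 1$, unless $\sigma_f \ll 1$. However, the ML term is $1/f$:

$$\frac{\partial \ln \mathcal{L}_{n=-1}}{\partial f} = \frac{1}{f} - \frac{\chi^2_{n=-1}}{2f} - \frac{(f - 1)}{\sigma_f^2}. \tag{30}$$

For the ML method, $f = 1$.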
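
To gauge how far the $\chi^2$ solution for $n = -1$ moves from $f = 1$, one can set a derivative of the form $c/f + 2(f - 1)/\sigma_f^2$ to zero, which gives the quadratic $f^2 - f + c\,\sigma_f^2/2 = 0$. The sketch below solves it for the root nearer to one. Treating the expectation value $c$ as an input, with $c = 2$ (two terms of unit expectation) as the default, is an assumption of this sketch, and the function name is hypothetical.

```python
import math

def chi2_scale_root(sigma_f, c=2.0):
    # Root of f^2 - f + c*sigma_f^2/2 = 0 nearest to f = 1,
    # or None when the quadratic has no real root.
    disc = 1.0 - 2.0 * c * sigma_f ** 2
    if disc < 0.0:
        return None
    return 0.5 * (1.0 + math.sqrt(disc))

for sigma_f in (0.05, 0.1, 0.2, 0.4, 0.6):
    print(sigma_f, chi2_scale_root(sigma_f))
```

Under this assumption the root stays close to one only for small $\sigma_f$, and for large enough $\sigma_f$ the derivative has no zero at all, whereas the ML condition gives $f = 1$ regardless of $\sigma_f$.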

5 Summary 

The PPP problem arises because the $\chi^2$ method incorrectly ignores the normalizations
of the multidimensional probability density functions when the total expected number of
events is not fixed. For an event histogram the maximum likelihood method is correct if:

• Errors are taken as the square root of the theory model; they are not to be taken as
the square root of the number of events in the bin.

• The normalization factor is included with the theory model.

• The subtraction for noise is included with the theory model. The data is the number
of events obtained experimentally. All corrections belong to the theory model.

This ML result is completely equivalent to a modified $\chi^2$ approach. Use the usual $\chi^2$, but,
when derivatives are taken to find the $\chi^2$ minimum, omit the derivatives of the inverse
error matrix.